Approximation of the Proximal Operator of the $\ell_\infty$ Norm Using a Neural Network
Computing the proximal operator of the $\ell_\infty$ norm, $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$, generally requires a sort of the input data, or at least a partial sort similar to quicksort. In order to avoid using a sort, we present an $O(m)$ approximation of $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ using a neural network. A novel aspect of the network is that it is able to accept vectors of varying lengths due to a feature selection process that uses moments of the input data. We present results on the accuracy of the approximation, feature importance, and computational efficiency of the approach. We show that the network outperforms a "vanilla neural network" that does not use feature selection. We also present an algorithm with corresponding theory to calculate $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ exactly, relate it to the Moreau decomposition, and compare its computational efficiency to that of the approximation.
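The exact computation mentioned in the abstract can be illustrated via the Moreau decomposition, which reduces $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x})$ to a Euclidean projection onto the $\ell_1$ ball of radius $\alpha$: $\textbf{prox}_{\alpha ||\cdot||_\infty}(\mathbf{x}) = \mathbf{x} - P_{\{||\cdot||_1 \le \alpha\}}(\mathbf{x})$. The sketch below is not the paper's algorithm; it uses the standard sort-based $\ell_1$-ball projection (the sort being exactly the cost the paper's network approximation avoids), and the function names are illustrative.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto {u : ||u||_1 <= radius},
    via the standard sort-and-threshold algorithm."""
    if np.sum(np.abs(v)) <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]          # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index k with u[k] * (k+1) > css[k] - radius
    k = np.nonzero(u * np.arange(1, v.size + 1) > (css - radius))[0][-1]
    theta = (css[k] - radius) / (k + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_linf(x, alpha):
    """prox of alpha*||.||_inf via the Moreau decomposition:
    prox_{alpha*||.||_inf}(x) = x - P_{||.||_1 <= alpha}(x)."""
    return x - project_l1_ball(x, alpha)
```

For example, `prox_linf(np.array([3.0, -1.0]), 1.0)` returns `[2.0, -1.0]`, and any input with $||\mathbf{x}||_1 \le \alpha$ maps to the zero vector, as the theory predicts. The sort inside `project_l1_ball` is the $O(m \log m)$ step that motivates an $O(m)$ learned approximation.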
Are emergent abilities of large language models a mirage? – Interview with Brando Miranda
Rylan Schaeffer, Brando Miranda, and Sanmi Koyejo won a NeurIPS 2023 outstanding paper award for their work Are Emergent Abilities of Large Language Models a Mirage?. In their paper, they present an alternative explanation for emergent abilities in large language models. We spoke to Brando about this work, their alternative theory, and what inspired it. Asked what emergence means, he said: "This is a good and hard question to answer cleanly because the word emergence has been around in science for a while. For example, in physics, when you reach a certain number of uranium atoms you can make a bomb, but with fewer than that you can't."
Making use of supercomputers in financial machine learning
Cotte, Philippe, Lagier, Pierre, Margot, Vincent, Geissler, Christophe
This article is the result of a collaboration between Fujitsu and Advestis. The collaboration aimed to refactor and run an algorithm, based on systematic exploration, that produces investment recommendations on a Fugaku-type high-performance computer [11], in order to see whether a very high number of cores would allow a deeper exploration of the data than a cloud machine, hopefully resulting in better predictions. We found that increasing the number of explored rules yields a net increase in the predictive performance of the final ruleset. In the particular case of this study, however, we also found that using more than around 40 cores brings no significant gain in computation time. This limitation is explained by a threshold-based search heuristic used to prune the search space. We have evidence that for similar data sets with less restrictive thresholds, the number of cores actually used could be much higher, allowing parallelization to have a much greater effect.